Scheduling PID Attitude and Position Control Frequencies for Time-Optimal Quadrotor Waypoint Tracking under Unknown External Disturbances

Recently, the use of quadrotors has increased in numerous applications, such as agriculture, rescue, transportation, inspection, and localization. Time-optimal quadrotor waypoint tracking is defined as controlling quadrotors to follow the given waypoints as quickly as possible. Although PID control is widely used for quadrotor control, it is not adaptable to environmental changes, such as various trajectories and dynamic external disturbances. In this work, we discover that adjusting PID control frequencies is necessary for adapting to environmental changes by showing that the optimal control frequencies can be different for different environments. Therefore, we suggest a method to schedule the PID position and attitude control frequencies for time-optimal quadrotor waypoint tracking. The method includes (1) a Control Frequency Agent (CFA) that finds the best control frequencies in various environments, (2) a Quadrotor Future Predictor (QFP) that predicts the next state of a quadrotor, and (3) combining the CFA and QFP for time-optimal quadrotor waypoint tracking under unknown external disturbances. The experimental results prove the effectiveness of the proposed method by showing that it reduces the travel time of a quadrotor for waypoint tracking.


Introduction
In recent years, the use of quadrotor has rapidly increased in various fields, including agriculture [1,2], rescue [3,4], delivery [5], inspection [6], and map construction [7]. A quadrotor, or drone, shows a versatile ability for various kinds of missions because of the simple structure, the simple use of control, and useful properties, such as VTOL (vertical take-off and landing).
Quadrotor waypoint tracking is defined as controlling quadrotors to follow the given waypoints. To complete time-critical missions, such as rescue and delivery, time-optimal quadrotor waypoint tracking is essential. Although PID control [8] is widely used for controlling robots, including quadrotors [9][10][11], it is hardly generalizable to various environments, such as various trajectories and dynamic external disturbances, which cannot be precisely modeled due to the uncertainties [12]. Existing PID control methods for quadrotors use a fixed position control frequency, where the position errors are fed to the PID periodically. However, in this work, we discover that different trajectories require different control frequencies to achieve time-optimal tracking control. We also show that different control frequencies are required as the external disturbances change.
There have been numerous efforts for adaptive control in environmental changes, including cascade control [13,14], finite-time control [15,16], coordinated control [17,18], PID gain scheduling [19,20], backstepping control [21,22], sliding mode control [23,24], external disturbance estimation [25,26], and compensation [27,28]. However, no existing studies consider finding the best control frequencies for time-optimal tracking control. Therefore, we suggest a method for scheduling PID control frequencies for timeoptimal quadrotor waypoint tracking in various environments under unknown external disturbances. First, we propose the Control Frequency Agent (CFA), a deep reinforcementlearning-based model to schedule the ratio of position and attitude control frequencies.
Given the information at the current timestep, the CFA finds the best control frequencies for time-optimal waypoint tracking in various trajectories. Secondly, we suggest the Quadrotor Future Predictor (QFP), a neural-network-based model that predicts the next state of a quadrotor. Given the current state and action of a quadrotor, QFP predicts the next state of the quadrotor. Finally, we propose a way to compensate the external disturbances by combining CFA and QFP. Specifically, the current external disturbances are estimated by QFP, and CFA schedules control frequencies given the estimates as additional information. The proposed method is generalizable to various trajectories under various unknown external disturbances. To the best of our knowledge, none of the existing studies deal with scheduling PID control frequencies for improving quadrotor waypoint tracking control.
To evaluate the effectiveness of the proposed method, we make comparisons with a conventional PID control method, whose attitude and position control frequencies are fixed. For experiments, we use various trajectories with and without external disturbances. The methods are evaluated in terms of the total travel time taken to follow the given waypoints. The experimental results show that the proposed method outperforms the conventional PID controller, which uses a fixed control frequency, in waypoint tracking under external disturbances.
Our contributions are as follows: (1) discovering that the optimal control frequencies may vary in different environments, (2) a deep reinforcement-learning-based Control Frequency Agent (CFA) that adjusts the control frequencies of a quadrotor for time-optimal quadrotor waypoint tracking in various environments, (3) a neural-network-based Quadrotor Future Predictor (QFP) that estimates the next state of a quadrotor, and (4) the combination of CFA and QFP for time-optimal quadrotor waypoint tracking under various unknown external disturbances.

Related Work
PID control [8] is generally used in many industries because of its simplicity [9,11]. However, conventional PID control may not be applicable under dynamic external forces. There have been numerous studies for adaptive control in dynamic environmental changes, such as cascade control [13,14], finite-time control [15,16], coordinated control [17,18], PID gain scheduling [19,20], backstepping control [21,22], sliding mode control [23,24], external force estimation [25,26], and compensation [27,28]. However, none of the studies consider scheduling control frequencies. Since this paper is focused on finding the optimal control frequencies in different environments, the existing methods are not directly comparable. Instead, the proposed method can be combined with the existing methods for further improvements.
One of the prominent approaches for handling unknown external disturbances is estimating external disturbances [25,26] and compensating the estimated disturbance [21,27,28]. Similarly, we train the external force estimator to predict the next state of a quadrotor given the current state and action. Then, we compensate the estimated external disturbance by scheduling PID attitude and position control frequencies. Although we follow the model architecture that consists of an external force estimator and a compensator, none of the existing studies compensate the estimated force by adjusting PID control frequencies.

Quadrotor Dynamics
The kinematics of a quadrotor can be described in an inertial frame A with a triad a 1 , a 2 , and a 3 and a body frame B with triad b 1 , b 2 , and b 3 . The geometric view of two frames is presented in Figure 1. The quadrotor has 6 degrees of freedom in the position ξ = [x, y, z] and the orientation η = [φ, θ, ψ], where ξ is the position of the body frame with respect to the inertial frame A and η is the rotation coordinates of the body frame with respect to the inertial frame. These values are denoted as roll, pitch, and yaw [29].   We use the Z − X − Y Euler angle convention to model the rotation of a quadrotor in the inertial frame, following [29]. By Newton's equations of motion, we have where m is the mass of the quadrotor, g is the gravity, and A R B is the rotation matrix from A to B. Each F i is a vertical force produced by each rotor of the quadrotor with angular velocity w i . A rotor also produces a moment M i .
Here, k F and k M are constants that should be experimentally tuned. By summing the forces, we can control the acceleration of the quadrotor. Therefore, we have the first input u 1 Rotor 1 and 3 rotate in the −b 3 direction and produce moment M 1 and M 3 , whose directions are opposite to the direction of rotation. Furthermore, rotor 2 and 4 rotate in the b 3 direction and produce moment M 2 and M 3 . M 1 and M 3 act in the b 3 direction, while M 2 and M 4 act in the −b 3 direction. By Euler's equations of motion, the angular acceleration is defined as where × is the outer product, L is the length of the blades, and I is the inertial matrix of the quadrotor. We have the second input vector u 2 using Equations (2) and (3) as follows.
The control of a quadrotor can be separated into the position and the attitude control. In general, attitude control is ≈10 times more frequent than the attitude control [30]. The implementation of the stable control keeps the attitude of the drone as [0, 0, ψ] while controlling the position of the drone to follow the reference trajectory. The hover configuration is given as ξ = ξ 0 , θ = φ = 0,ξ = 0 andφ =ψ =θ = 0. In this state, the force for each robot is

PID Control
When there is an error e(t) at a time step t between the sensed position y(t) and the desired position r(t), the controller calculates the motor speed to reduce the error to 0. When the control only uses the current error, it usually fails to reach the desired setpoint, and thus, integral and derivative terms are additionally required. A Proportional-Integral-Derivative (PID) controller has three gains. The proportional term is used to reduce the currently observed error. The integral term is used to accumulate errors to reach the desired position faster. The derivative term is used to stabilize the control. The overall PID control process is shown in Figure 2. The basic PID controller. The error term e t is the difference between the desired position r(t) and the sensed position y(t). Each PID term is computed based on the current and past errors. The force of each rotor is computed from Equations (4) and (6). After processing the input, we have the current output y(t). This feedback loop decreases e t gradually.

Position and Attitude Control
For the quadrotor control, the position and attitude control can be separated. Therefore, it is possible to run the attitude control more frequently than the position control using the nested feedback loops. An example of the feedback loop is illustrated in Figure 3. Next, the position controller produces a control input u1(t) and determines the desired roll, pitch, and yaw angles. Then, the attitude controller produces a control input u2(t) given the desired angles. The control inputs u1(t) and u2(t) are fed to the drone input processor to control the rotors.

Drone Input Processor
The position control is to follow the reference trajectory in three dimensions. Additionally, the desired yaw angle can be specified independently. For position control, PID coefficients must be properly tuned for each dimension and yaw angle. Then, the position control algorithm will produce the desired roll and pitch angles.
The attitude control is to track a trajectory following the desired roll, pitch, and yaw angles. The attitude should approach the nominal hover state, where the roll and pitch are near zero. PID coefficients must be properly determined for each of the raw, pitch, and yaw angles.

Method
In this section, we propose a method to schedule PID control frequencies for timeoptimal quadrotor waypoint tracking under unknown external disturbances. The proposed method includes (1) Figure 4. The proposed QFP and CFA models. QFP is trained to predict the next position of a quadrotor controlled by PID in the environment without external effects. After training, the next state estimated by QFP is used as the input of CFA. Then, CFA is trained to adjust the ratio of the position and attitude control frequencies against external effects by maximizing the given reward function.
The approximated external effect is computed as the difference between the current state (in a red dashed box) and the estimated state (in a green dashed box). CFA is trained to balance the position and attitude control frequencies for time-optimal waypoint tracking control.

Control Frequency Agent
A quadrotor mainly has two controllers, an attitude controller and a position controller. The inner attitude control loop controls the Euler angles and the outer position control loop controls the trajectory in three dimensions. In general, the attitude control frequency is 5∼10 times larger than the position control frequency [29]. As the attitude control frequency becomes larger than the position control frequency, the quadrotor flies more stably, but it follows the target position slowly. On the other hand, as the position control frequency becomes larger than the attitude control frequency, the flight becomes unstable, so it may fall. In addition, we find that the optimal control frequencies may vary according to the environmental changes. Therefore, finding the optimal ratio of control frequencies is essential for time-optimal control in quadrotor waypoint tracking.
Therefore, we propose a Control Frequency Agent (CFA), which is a deep reinforcementlearning-based model that schedules the ratio of control frequencies of the position and attitude of the quadrotor. We define the observation as the current state of a quadrotor, which includes rotations in roll, pitch, yaw angles, a linear and angular velocity vector, and the motors' speeds. The action space is defined as a discrete action space with three possible actions: (1) increase the attitude frequency, (2) decrease the attitude frequency, and (3) keep the current frequency ratio. The reward is the sparse reward and the agent obtains 10/(1 + reach time) when it reaches the last waypoint. We use a conventional implementation of the on-policy model for the CFA model. Although we adopt the Proximal Policy Gradient [31] for its simplicity, any deep reinforcement learning models can be adopted. The model is trained to maximize the reward, which is maximized when the total travel time taken to follow the reference waypoints is minimized.

Quadrotor Future Predictor
Although PID control is simple and effective, the input of the controller is derived from the linear equation of quadrotor's kinematic information, not considering external forces. Therefore, we suggest a way to separate the total setpoint error in the error caused by the PID controller and the error caused by the external effects. Let SP t and CP t be the set point and the current position of the quadrotor at a timestep t. The PID controller computes the target RPMs for each rotor, and the quadrotor moves to the next position CP t+1 . Here, the error is defined as E t = SP t − CP t+1 if and only if there is no external effect. However, when there are external effects, the current position CP t+1 becomes CP w t+1 = CP t+1 + t , where t is the additional movement of the drone by external effects. The error E w t in the environment with the external effects can be written as Therefore, the error term can be decomposed into the error term E t produced by the PID control and the error t produced by the external effects. Even though PID control is simple, predicting the next position of the quadrotor controlled by the PID control is not trivial. This is because PID control is dependent not only on the current position but also on the past information of the control.
Based on the setpoint error decomposition defined above, we propose the Quadrotor Future Predictor (QFP), which predicts the next state of the drone based on (1) the kinematic information of the quadrotor, including CP t and SP t , (2) the current PID errors before multiplying gains, and (3) the gain for each PID term. The model is trained to predict the next state of the drone, which includes the next position CP t+1 and the Euler angles. Specifically, the training samples are generated by moving to a random position near the current position, where the random position is within the range of where the quadrotor can move in a single timestep. The model is composed of three feedforward layers with a ReLU activation function in-between. Batch normalization and dropout are applied after the first and second layers. The observation of the QFP is defined as the current state of a quadrotor, including roll, pitch, yaw angles, the linear and angular velocity vector, and the motors' speeds. The QFP estimates a relative vector of the next position from the current position. Specifically, the QFP model is trained with mean squared error (MSE) loss to predict the next position CP t+1 of the quadrotor based on the current state information.
Since CP w t+1 is an observed value, we can approximate t ≈ CP w t+1 − CP t+1 . Accordingly, we can approximate the external effect with the QFP model. Note that the model is trained in the environment where there is no external effect.

Trajectory Tracking with CFA + QFP
After training both the QFT and CFA models, it is possible to use a feedback loop that combines the QFP, CFA, and a conventional PID controller for waypoint tracking. The overall procedure is illustrated in Figure 5. Given the current state of a drone and the motor RPMs computed by the PID controller, the QFP model predicts the next state and approximates the external disturbances. Then, given the estimated external effects as additional information, the CFA model adjusts the position and attitude control frequencies periodically. Then, the PID controller runs with the adjusted control frequencies.

Position Controller
Attitude Controller

Experiments
We use the the gym-pybullet-drones [32] drone simulation platform to conduct experiments. It is an open-source, OpenAI Gym-like [33] multi-quadrotor simulator based on the Pybullet Physics [34]. We use the Bitcraze Crazyfile 2.x, which is one of the default quadrotor models.
The QFP model comprises three layers of a fully connected neural network with ReLU activation. The sizes of the hidden dimensions are 16, 32, and 64, respectively, for each layer. To train the QFP model, we use the 4803 samples for training and 24,019 samples for validation that are generated by random moves. For hyper-parameters, we use the Adam optimizer with the learning rate of 0.001, the dropout probability of 0.2, the batch size of 256, and the number of epochs as 100. The CFA model is composed of three fully connected layers with ReLU activation. The sizes of the hidden dimensions are tested with 256, 256, and 256, respectively, for each layer. For training the CFA model, we use the batch size of 256, the horizon size of 4000, the lambda as 0.99, the gamma as 0.99, the learning rate of 0.00001, and the number of epochs as 1000. Each episode is 10 seconds long at maximum. We compare the proposed model (CFA+QFA) with the conventional PID baseline and the CFA without QFA model (CFA-QFA). The CFA-QFA adjusts the position control frequency without information about the external effects estimated by the QFA model. By comparing the CFA+QFA model with the CFA-QFA model, we verify the effectiveness of the QFA model. Finally, we show the effectiveness of combining the QFP and CFA models by comparing the conventional PID and the CFA+QFA model.
We first empirically show that the optimal PID position frequency may vary for different trajectory types. We show the results on the four trajectory types shown in Figure 6. The first three of them are 2D movements, and the last one is 3D. The distances between waypoints are equally distributed. When the quadrotor is close enough to the current target waypoint, the target waypoint is set to the next waypoint. Next, we measure the prediction performance of the QFP model. We then compare the performance of the CFA+QFP, CFA-QFP, and PID models in waypoint tracking in various environments.

Rectangle
Circle Up Zigzag

Results
First, we test the effect of the position control frequencies for the four different trajectory types. The results are shown in Figure 7. We discover that the optimal control frequencies vary for different trajectories. Specifically, the rectangle and zigzag types require smaller position control frequencies than the circle and up types to minimize the reach time. In addition, there are some cases where the position control frequency significantly affects the reach time, as shown in the third row (zigzag).  Next, we show the results of the QFP performance for four randomly generated trajectories in Figure 8. The blue line indicates the real trajectory, and the orange line indicates the predicted trajectory. We observe that the QFP model perfectly predicts the direction of the next position, whereas there are some errors in the magnitude. However, it is sufficient statistics because it is important for a quadrotor to know the direction of external forces, such as drag and wind.
Then, to verify the effectiveness of the proposed method (CFA+QFP) under unknown external disturbances, we test the performance of the CFA+QFP model under different wind types. Specifically, the results on the up-trajectory with two different types of wind are shown in Figures 9 and 10. We compare the proposed method (CFA+QFP) with the conventional PID controller and CFA-QFP, which uses the CFA model without the external effects estimated by the QFP model. Figure 9 shows the results of the trajectory and the reach time of the last waypoint. The first wind type is a circular wind type, which is not trivial to estimate the external effects. The second wind type is a linear wind type, which is simpler than the first one. The results show that the proposed method outperforms the conventional PID controller by scheduling the PID control frequencies. We also observe that the CFA performs worse than the conventional PID controller without the QFP model. This stipulates that the external effects estimated by the QFP model are essential for finding the optimal control frequency against the external disturbances. Additionally, we observe that the performance gap is larger in the first wind type, which is more complex. This indicates that the proposed model is more effective as the environment changes become more complex. For more detailed analysis, Figure 10 shows the XYZ-positions and raw, pitch, and yaw angles on the first wind type. We see that the CFA-QFP model fails to find the optimal control frequency without the information estimated by the QFP model, and so, it fails to stably follow the waypoints at some point. We also show the average reward graph for two models for episodes in Figure 11. Although the reward of the CFA-QFP increases, there is a gap between the reward of CFA+QFP and that of CFA-QFP. This is because the estimated external effects are essential information to find the optimal control frequency.  reward + QFP -QFP Figure 11. The reward graph of the CFA+QFP and CFA-QFP models. The reward graph shows that the disturbance information predicted from the QFA is useful to the CFA model.

Conclusions
In this work, we discover that the optimal control frequencies may vary in different environments. Accordingly, we suggest a method for scheduling PID control frequencies for time-optimal quadrotor waypoint tracking in various environments under unknown external disturbances. The proposed method is composed of (1) the Control Frequency Agent (CFA), which adjusts the PID control frequencies according to environmental changes based on deep reinforcement learning, and (2) the Quadrotor Future Predictor (QFP), which estimates the external disturbances based on neural networks. Combining the CFA and QFP, the proposed method finds the best PID control frequencies for time-optimal quadrotor waypoint tracking under various environmental changes. The experimental results verify that the proposed method outperforms the conventional PID controller in waypoint tracking in various environments.

Conflicts of Interest:
The authors declare no conflict of interest.